The SPARTACUS-Database: a Spanish Sentence Database for Offline Handwriting Recognition

Authors

  • Salvador España Boquera
  • María José Castro Bleda
  • José Luis Hidalgo

Abstract

In this paper we describe a database of offline handwritten Spanish sentences drawn from four different subtasks. The database includes 1 500 forms, each filled in by a different writer. The collection contains around 100 000 word instances from a vocabulary of around 3 300 words. The database is intended for offline handwriting recognition tasks and is expected to be especially useful for recognition systems that can take advantage of language models for restricted-semantic tasks. It also includes a few image-processing procedures for extracting the handwritten text images from the forms and segmenting those images into lines and words.

Similar resources

Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model

In order to facilitate the entry of data into the computer and its digitization, automatic recognition of printed texts and manuscripts is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...

Rejection Strategies in Handwriting Recognition Systems

This master thesis investigates multiple rejection strategies for offline handwritten sentence recognition. The rejection strategies are implemented as a post-processing step of a Hidden Markov Model based text recognition system, and are based on confidence measures derived from a list of additional candidate sentences produced by the recogniser. Four different reject models are presented and ...

Isolated Persian/Arabic handwriting characters: Derivative projection profile features, implemented on GPUs

For many years, researchers have studied high-accuracy methods for recognizing handwriting and have achieved many significant improvements. However, an issue that has rarely been studied is the speed of these methods. Considering the limitations of computer hardware, these methods need to run at high speed. One of the methods to increase the processing speed is to use the computer pa...

A Full English Sentence Database for Off-line Handwriting Recognition

In this paper we present a new database for off-line handwriting recognition, together with a few preprocessing and text segmentation procedures. The database is based on the Lancaster-Oslo/Bergen (LOB) corpus. This corpus is a collection of texts that were used to generate forms, which subsequently were filled out by persons with their handwriting. Up to now (December 1998) the database include...

Sentence Recognition through Hybrid Neuro-Markovian Modeling

This paper focuses on designing a handwriting recognition system dealing with on-line signal, i.e. temporal handwriting signal captured through an electronic pen or a digitizing tablet. We present here some new results concerning a hybrid on-line handwriting recognition system based on Hidden Markov Models (HMMs) and Neural Networks (NNs), which has already been presented in several contributi...

Publication date: 2004